The Domain Name System
Every computer on the Internet has a unique, 32-bit
number assigned to it. This number, called the IP address, is essential if you
want to be considered live on the Internet. If you don't have a number, you're
not on the net. Most people who surf the Internet have an IP address dynamically
allocated to them when they dial up and log into their ISP's service. This IP
address usually changes every time they dial their Internet Service Provider. Of
course, this doesn't really matter to most people because they aren't running
servers on their machines. Only when you're running a server do you
need to worry about the permanence of your IP address. After all, if your IP
address keeps changing, then it's going to be a wee bit difficult to
reach your page! These IP addresses, in the range zero to a little over four
billion, are a mite difficult to remember, even when written as four decimal
numbers separated by dots. If you had to memorize a number like 198.24.3.103
for each and every site you like, then you'd run out of patience pretty fast.
It was to minimize this frustration and free up mental resources that the DNS
was created. The DNS, or Domain Name System, converts names to their
corresponding IP addresses. It is only because of the DNS that you can now
write www.microsoft.com instead of having to write its whole IP address. It
just makes the Internet so much easier to use.
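To make the link between the 32-bit number and the dotted form concrete, here's a minimal sketch in C. The function names ip_to_number() and number_to_ip() are our own, made up for illustration; they are not part of WinSock.

```c
#include <assert.h>
#include <stdio.h>

/* Pack the four parts of a dotted address into one 32-bit number.
   The first part ends up in the most significant byte. */
unsigned long ip_to_number(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a < 256 && b < 256 && c < 256 && d < 256);
    return ((unsigned long)a << 24) | ((unsigned long)b << 16) |
           ((unsigned long)c << 8)  |  (unsigned long)d;
}

/* Write the dotted form of a 32-bit address into out (at least 16 bytes). */
void number_to_ip(unsigned long n, char *out)
{
    sprintf(out, "%lu.%lu.%lu.%lu",
            (n >> 24) & 0xFF, (n >> 16) & 0xFF, (n >> 8) & 0xFF, n & 0xFF);
}
```

Calling ip_to_number(198, 24, 3, 103) gives 3323462503, which is the single number hiding behind 198.24.3.103.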
The DNS originally started out as a text file on a
central computer. It had two columns, one for IP addresses and the other for
the matching names. This approach was very inefficient. If I wanted to update
the listing, I had to go to the computer and tell the operator to manually
update the file. This was fine for as long as the Internet was only a couple of
hundred computers strong, but when you start talking about a couple of hundred
million computers, then having one file is simply not enough. Besides, there
was no redundancy in the design. If the central computer went down, then no one
could access the DNS. Even if the main DNS was functional, the load on it would
be phenomenal. Anything will grind to a halt if a couple of million
computers attempt to use it! So the people who were designing the Internet
decided to metamorphose the DNS into something that would be scalable,
distributed and efficient; and the modern DNS was born.
The DNS distributes the load among a whole bunch of DNS
servers. Every ISP has its own DNS, yet you can find the IP address of any
name through any DNS server. This process of converting names to numbers is
best described through a specific example.
Type www.netscape.com. in your browser window.
Always read domain names from right to left rather than from left to right. The
first thing you should see is the '.' after 'com'. This trailing period is
entirely optional and it signifies the root of the DNS. The modern DNS is
organized a bit like a tree. There is a root with certain main branches, called
Top Level Domains (TLDs), and these branches divide and further sub-divide,
becoming more intricate and web-like. At the root are the Root Servers, thirteen
named servers lettered A to M. These servers know the addresses of all the
servers which handle individual TLDs like .com, .edu, .mil, .gov, .in, .uk etc.
The TLD servers are scattered all over the world and are said to be
authoritative for those TLDs. This means that they have the Start of Authority
or SOA for those domains. So, for example, the Root Servers know that the .in
domain is handled by a certain DNS server in India.
The Top Level Domain .com is handled by a server
somewhere in the U.S. of A. The server handling .com knows the address of the
server in charge of the .netscape domain. This server belongs to Netscape. The
.netscape server has the SOA for the domain www. Nothing stops Netscape from
adding another level after www, making their address aaa.www.netscape.com. The
aaa part of the name would then be handled under www. In fact, it's not
compulsory to have a www.something.com. Tradition is the only reason www is used
as the leftmost label. There are many TLDs; besides the general ones, every
country has its own. India's TLD is .in, so Indian sites end with a .in. For
example, a friend of ours has a site called www.mafatlal.co.in. Here www is a
domain under mafatlal, which is a domain under .co, which in turn is a domain
under .in. The .co is short for commercial, just like .com, which means that
this is a commercial site.
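Reading a name from right to left is easy to try out in code. The sketch below picks out one label at a time, counting from the right; label_from_right() is a hypothetical helper of our own, not a library function.

```c
#include <assert.h>
#include <string.h>

/* Copy the n-th label of name, counted from the right (0 is the TLD),
   into out. Returns 1 on success, 0 if the name has fewer labels or
   the label does not fit in out. */
int label_from_right(const char *name, int n, char *out, size_t outsize)
{
    size_t end = strlen(name);
    size_t start;

    assert(out != NULL && outsize > 0);
    if (end > 0 && name[end - 1] == '.')   /* ignore the optional root dot */
        end--;
    for (;;) {
        start = end;
        while (start > 0 && name[start - 1] != '.')
            start--;                       /* walk back to the previous dot */
        if (n == 0)
            break;
        if (start == 0)
            return 0;                      /* ran out of labels */
        end = start - 1;                   /* step over the dot */
        n--;
    }
    if (end - start >= outsize)
        return 0;
    memcpy(out, name + start, end - start);
    out[end - start] = '\0';
    return 1;
}
```

For www.mafatlal.co.in, labels 0, 1, 2 and 3 come out as in, co, mafatlal and www, which is the order in which the DNS servers are consulted.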
Domain names are
assigned by a body called the InterNIC in America, under the overall authority
of the IANA (headed by Jon Postel). Other countries have their own national
NICs; for example, the APNIC is for the Asia Pacific. The structure of the
domain names is going to start changing soon. There are plans afoot to add more
gTLDs (generic Top Level Domains). The ones in use are:
.com - For Commercial sites.
.net - For sites emphasizing networking e.g. ISP's
.org - For non-profit organizations
.mil - For the U.S. military
.gov - For government sites
.edu - For educational sites
.int - For International organizations
Some of the proposed gTLDs are:
.firm - For businesses
.store - For businesses offering on-line purchase
.web - For entities concentrating on Internet related activities
.arts - For sites emphasizing cultural and artistic content
.nom - For personal sites
.info - For sites providing information e.g. Libraries
.rec - For sites providing entertainment.
The process of granting domain names is also being
democratized with up to 28 new domain name handling agencies to be selected by
lottery from applicants world wide.
That's enough
about the layout of the DNS service, now to study the structure of the actual
DNS packets...
Deciphering the DNS packet...
The DNS
listens for UDP packets on port 53. UDP is used instead of TCP because it is
much faster for a short question and answer. TCP is slow not because of any
inherent weakness, but because it sets up a connection and attempts to be
reliable, and reliability is very time consuming. UDP packets don't always
reach the other end, but when they do, they're really fast. The DNS will also
respond to TCP connections on port 53, which are used for replies too large for
a UDP packet and for transfers of whole zones between servers. The best known
DNS server program is BIND, the Berkeley Internet Name Domain. It was written
at the University of California, Berkeley, and its later versions are still
used today.
When you
install your TCP/IP stack, you're asked for the IP addresses of your DNS
servers. You're usually supposed to enter two, but even one will do. The second
DNS address is used as a backup in case the first server is down. When a
browser, or any other program which uses WinSock, calls the function
gethostbyname(), a UDP packet is sent to your ISP's DNS on port 53. This packet
will ask the server if it knows the IP address of the site mentioned, e.g.
www.netscape.com. If another person had recently (usually less than a day ago)
asked for the IP address of the same site, then the DNS sends you the cached
copy of the reply and the response is almost instantaneous. Things proceed at a
more leisurely pace if the response is not cached. If the DNS does not find a
match in its cache, then it sends a query to one of the Root Servers asking for
the address of the server which has the SOA (the Start of Authority) for the
domain .com. The Root Server will send your ISP's DNS the address of the server
which handles the .com domain. Our DNS server will now send a query to the .com
server, asking it if it knows the address of the authoritative DNS server for
.netscape. The .com DNS server will reply with the appropriate IP address. Now
our DNS server will ask Netscape's DNS server for the IP address of www and, if
the operation is successful, you'll get the correct IP address.
The step-by-step
process our ISP's DNS goes through, following one referral after another, is
called an Iterative Lookup. From our point of view the lookup is Recursive: our
ISP's DNS itself handles all the running around and all we do is wait for it to
send us the correct IP address. This complex processing is hidden from us
because all we do is call gethostbyname() and it does the rest.
To better
understand the DNS, we'll write a program which sends a raw DNS query packet to
a DNS server and tries to elicit a response.
dnsc.c
#include <windows.h>
#include <stdio.h>
unsigned char kk[1000],ll[1000];
void abc(char *p)
{
FILE *fp=fopen("z.txt","a+");
fprintf(fp,"%s\n",p);
fclose(fp);
}
int ii,dw,jj;
void abc1(unsigned char p)
{
FILE *fp=fopen("z.txt","a+");
fprintf(fp,"%x..%d..%c\n",p,p,p);
fclose(fp);
}
WNDCLASS a;HWND b;MSG c;char aa[200];SOCKET s;struct hostent h;
WSADATA ws;DWORD e;char bb[100];struct sockaddr_in sa;
long _stdcall zzz (HWND,UINT,WPARAM,LPARAM);
int _stdcall WinMain(HINSTANCE i,HINSTANCE j,char *k,int l)
{
a.lpszClassName="a1";
a.hInstance=i;
a.lpfnWndProc=zzz;
a.hbrBackground=GetStockObject(WHITE_BRUSH);
RegisterClass(&a);
b=CreateWindow("a1","time client",WS_OVERLAPPEDWINDOW,1,1,10,20,0,0,i,0);
ShowWindow(b,3);
while ( GetMessage(&c,0,0,0) )
DispatchMessage(&c);
return 1;
}
long _stdcall zzz (HWND w,UINT x,WPARAM y,LPARAM z)
{
if ( x == WM_LBUTTONDOWN)
{
e=WSAStartup(0x0101,&ws);
sprintf(aa,"WSAStartup e = %ld",e);
s = socket(PF_INET,SOCK_DGRAM,0);
sprintf(aa,"socket s = %ld",s);
sa.sin_family=AF_INET;
sa.sin_port=htons(53);
sa.sin_addr.s_addr = inet_addr("202.54.1.18");
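kk[0]=0;      // ID, high byte
kk[1]=1;      // ID, low byte
kk[2]=1;      // flags, high byte: only RD (Recursion Desired) is set
kk[3]=0;      // flags, low byte
kk[4]=0;      // number of questions, high byte
kk[5]=1;      // number of questions, low byte
kk[6]=0;      // number of answers (2 bytes)
kk[7]=0;
kk[8]=0;      // number of authority records (2 bytes)
kk[9]=0;
kk[10]=0;     // number of additional records (2 bytes)
kk[11]=0;
kk[12]=3;     // length of the label "www"
kk[13]='w';
kk[14]='w';
kk[15]='w';
kk[16]=8;     // length of the label "netscape"
kk[17]='n';
kk[18]='e';
kk[19]='t';
kk[20]='s';
kk[21]='c';
kk[22]='a';
kk[23]='p';
kk[24]='e';
kk[25]=3;     // length of the label "com"
kk[26]='c';
kk[27]='o';
kk[28]='m';
kk[29]=0;     // end of the name
kk[30]=0;     // query type, high byte
kk[31]=1; // 1 for A, 2 for NS, 5 for CNAME, 13 for HINFO
kk[32]=0;     // query class, high byte
kk[33]=1;     // query class, low byte: 1 for the Internet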
e=sendto(s,(char *)kk,34,0,(struct sockaddr *)&sa,sizeof(sa));
sprintf(aa,"SendTo %ld",e);
abc(aa);
dw = sizeof(sa);
ii=recvfrom(s,(char *)ll,1000,0,(struct sockaddr *)&sa,&dw);
sprintf(aa,"Recv from %d",ii);
abc(aa);
for (jj=0;jj<ii;jj++)   // log every byte of the reply as hex..dec..char
abc1(ll[jj]);
closesocket(s);
WSACleanup();
MessageBox(0,"Over","All",0);
}
if ( x == WM_DESTROY)
PostQuitMessage(0);
return DefWindowProc(w,x,y,z);
}
This program
will seem pretty familiar to anyone who's read our WinSock tutorials. We create a simple window
and when you click in the window, the callback zzz() is called. It's the callback
which does all the work. When the program ends, a MessageBox is displayed.
We've used Visual C++ 4.2 for all our programming needs. The files in the
project are dnsc.c and wsock32.lib.
In the program
we create a socket which uses UDP and will work on the Internet. The port is
set to 53 and the address of the DNS server is set to 202.54.1.18, which is the
address of our ISP's DNS server.
After having
formed a socket, we initialize an array kk with some values. Have no fear, all
those numbers will be explained in due time. Since we're using UDP, we use the
functions sendto() and recvfrom() rather than simple send() and recv(). The
parameters of sendto() are almost identical to those of send(), with the
destination address and its size added at the end. The third parameter, 34, is
the number of bytes of kk that make up the packet. Using recvfrom() we store
the DNS response in an array ll and then, using our own function abc1(), save
the contents of ll in a file z.txt. The array kk contains the raw bytes which
constitute a DNS query packet. The first two bytes are the ID of the packet. It
can be any two byte number and we've decided to use 01 as our ID. The response
from the DNS server will also carry the same ID number to help you match
different queries sent at the same time to the different answers received later.
The next two
bytes are the flags field. Each bit of these two bytes has a special meaning.
We'll discuss this in detail in just a while. The next two bytes stand for the
number of questions we wish to ask. Right now all we're sending is one
simple query, so we put a 01 here.
The structure
for the DNS query and response packets is identical. Certain parts of the
packet are used when we are asking a question, other parts are used when we're
sending a response. The next two bytes i.e. kk[6] and kk[7] are supposed to
contain the number of answers we're sending. Since this is a query packet,
these bytes are set to zero. Then come two more bytes for the number of
Authority Records and another two bytes for the number of Additional Records.
Both these fields are set to zero. Now comes the domain name we want to find
the IP address of. Unlike in C where strings are NULL terminated, the strings
in the DNS packet follow a different format. Each label is preceded by a
byte which holds the length of that label. So kk[12] holds 3, which
means that the next three bytes, www (kk[13], kk[14] and kk[15]) hold one
label. kk[16] is 8 which means that the label following it is 8 bytes long and
so on. The name ends with a 0, a label of length zero which stands for the
root. The next two bytes are the Query Type which can hold different values. A
1 means we want the site's IP address. A 5 means we want to know the CNAME, the
canonical name that www.netscape.com is an alias for. 13 is for Host
information (HINFO) and so on. The last two bytes are the Query Class and these
bytes specify the type of network we're using. Since we're using the
Internet these bytes will always be 01.
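Filling in kk byte by byte gets tedious for any other name, so here's a rough sketch of how the same packet could be built by a function. The names encode_name() and build_query() are our own inventions for illustration. The limit of 63 bytes per label comes from the DNS specification; the rest follows the layout described above.

```c
#include <assert.h>
#include <string.h>

/* Encode "www.netscape.com" as 3www8netscape3com0 into out.
   Returns the number of bytes written, or 0 if a label is empty
   or longer than 63 bytes. */
size_t encode_name(const char *name, unsigned char *out)
{
    size_t n = 0;
    while (*name) {
        size_t len = strcspn(name, ".");   /* length of the next label */
        if (len == 0 || len > 63)
            return 0;
        out[n++] = (unsigned char)len;
        memcpy(out + n, name, len);
        n += len;
        name += len;
        if (*name == '.')
            name++;                        /* skip the separating dot */
    }
    out[n++] = 0;                          /* zero length label ends the name */
    return n;
}

/* Build a one-question query with only the RD flag set.
   buf must have room for 12 + strlen(name) + 2 + 4 bytes.
   Returns the packet length, or 0 on a bad name. */
size_t build_query(unsigned short id, const char *name,
                   unsigned short qtype, unsigned char *buf)
{
    size_t n;
    assert(name != NULL && buf != NULL);
    memset(buf, 0, 12);                    /* header: all counts start at 0 */
    buf[0] = (unsigned char)(id >> 8);
    buf[1] = (unsigned char)(id & 0xFF);
    buf[2] = 0x01;                         /* flags, high byte: RD */
    buf[5] = 1;                            /* one question */
    n = encode_name(name, buf + 12);
    if (n == 0)
        return 0;
    n += 12;
    buf[n++] = (unsigned char)(qtype >> 8);
    buf[n++] = (unsigned char)(qtype & 0xFF);
    buf[n++] = 0;                          /* query class, high byte */
    buf[n++] = 1;                          /* query class: 1 for the Internet */
    return n;
}
```

With an ID of 1, the name www.netscape.com and a query type of 1, build_query() returns 34 and fills the buffer with the same bytes we placed in kk by hand.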
The Flags...
Flags       | QR | OP Code | AA | TC | RD | RA | Zeros | Rcode
No. of Bits | 1  | 4       | 1  | 1  | 1  | 1  | 3     | 4
We'll start
from the left and explain each field as we go.
The first flag
is QR, which is a 1 bit field. If the DNS packet is a Query, as ours is,
it is set to zero. If the packet is a response, it is set to 1. The next 4 bits
are the OP code. The normal value here is 0 which stands for Standard
Query. 1 stands for Inverse Query and 2 means we're asking for the status of
the server. The AA flag is set to one if the server responding is
Authoritative for the domain in question, i.e. it has the Start of Authority
for that particular domain. The TC flag is turned on when the UDP packet
has been truncated. This means that the response was larger than 512 bytes and
thus only the first 512 bytes have arrived. The RD bit is the only bit
turned on in our packet above. It stands for Recursion Desired. This means
we're asking the server to go to each server down the line and get us the
information. So the query will have to be handled by the server itself. After
all, we don't want to have to go to the Root Server ourselves and then on to
more servers till we reach the authoritative one. If the bit is turned off and
the DNS does not have the SOA for the domain in question, then it will simply
return a list of servers for you to contact.
The RA
flag is related to the RD flag. It stands for Recursion Available and is
usually part of a DNS response from a server. It is set to one if the server
we've contacted supports recursion.
The next 3
bits are always zero. Keep them that way!
The last 4
bits constitute the Rcode or the return code. A 0 means no errors while
a 3 means a Name error occurred.
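Since the two flag bytes are really one 16-bit number, the fields can be pulled apart with shifts and masks. Here's a small sketch; the struct dns_flags and the two functions are our own names, chosen only for this illustration.

```c
#include <assert.h>

/* The fields of the 16-bit flags word, from the leftmost bit to the rightmost. */
struct dns_flags {
    unsigned qr, opcode, aa, tc, rd, ra, z, rcode;
};

/* Split the flags word (first flag byte * 256 + second flag byte). */
struct dns_flags unpack_flags(unsigned short f)
{
    struct dns_flags d;
    d.qr     = (f >> 15) & 0x1;   /* 0 = query, 1 = response */
    d.opcode = (f >> 11) & 0xF;   /* 0 = standard query */
    d.aa     = (f >> 10) & 0x1;   /* authoritative answer */
    d.tc     = (f >> 9)  & 0x1;   /* truncated */
    d.rd     = (f >> 8)  & 0x1;   /* recursion desired */
    d.ra     = (f >> 7)  & 0x1;   /* recursion available */
    d.z      = (f >> 4)  & 0x7;   /* always zero */
    d.rcode  =  f        & 0xF;   /* 0 = no error, 3 = name error */
    return d;
}

/* Put the fields back together into one flags word. */
unsigned short pack_flags(struct dns_flags d)
{
    assert(d.qr <= 1 && d.opcode <= 0xF && d.rcode <= 0xF);
    return (unsigned short)((d.qr << 15) | (d.opcode << 11) | (d.aa << 10) |
                            (d.tc << 9) | (d.rd << 8) | (d.ra << 7) |
                            (d.z << 4) | d.rcode);
}
```

Our query's flag bytes are 01 and 00, so the word is 0x0100. Passing that to unpack_flags() shows RD as 1 and every other field as 0.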
That was the
Query, now let's examine the answer we receive.
Hex | Dec | Char
0 | 0 |
1 | 1 |
81 | 129 |
80 | 128 |
0 | 0 |
1 | 1 |
0 | 0 |
2 | 2 |
0 | 0 |
3 | 3 |
0 | 0 |
3 | 3 |
3 | 3 |
77 | 119 | w
77 | 119 | w
77 | 119 | w
8 | 8 |
6e | 110 | n
65 | 101 | e
74 | 116 | t
73 | 115 | s
63 | 99 | c
61 | 97 | a
70 | 112 | p
65 | 101 | e
3 | 3 |
63 | 99 | c
6f | 111 | o
6d | 109 | m
0 | 0 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
c0 | 192 |
c | 12 |
0 | 0 |
5 | 5 |
0 | 0 |
1 | 1 |
0 | 0 |
0 | 0 |
8 | 8 |
69 | 105 | i
0 | 0 |
14 | 20 |
5 | 5 |
77 | 119 | w
77 | 119 | w
77 | 119 | w
38 | 56 | 8
30 | 48 | 0
8 | 8 |
6e | 110 | n
65 | 101 | e
74 | 116 | t
73 | 115 | s
63 | 99 | c
61 | 97 | a
70 | 112 | p
65 | 101 | e
3 | 3 |
63 | 99 | c
6f | 111 | o
6d | 109 | m
0 | 0 |
c0 | 192 |
2e | 46 | .
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
0 | 0 |
8 | 8 |
69 | 105 | i
0 | 0 |
4 | 4 |
c6 | 198 |
5f | 95 | _
f9 | 249 |
4b | 75 | K
8 | 8 |
4e | 78 | N
45 | 69 | E
54 | 84 | T
53 | 83 | S
43 | 67 | C
41 | 65 | A
50 | 80 | P
45 | 69 | E
c0 | 192 |
3d | 61 | =
0 | 0 |
2 | 2 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
6b | 107 | k
2f | 47 | /
0 | 0 |
5 | 5 |
2 | 2 |
4e | 78 | N
53 | 83 | S
c0 | 192 |
52 | 82 | R
c0 | 192 |
52 | 82 | R
0 | 0 |
2 | 2 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
6b | 107 | k
2f | 47 | /
0 | 0 |
c | 12 |
2 | 2 |
4e | 78 | N
53 | 83 | S
3 | 3 |
4d | 77 | M
43 | 67 | C
49 | 73 | I
3 | 3 |
4e | 78 | N
45 | 69 | E
54 | 84 | T
0 | 0 |
c0 | 192 |
52 | 82 | R
0 | 0 |
2 | 2 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
6b | 107 | k
2f | 47 | /
0 | 0 |
6 | 6 |
3 | 3 |
4e | 78 | N
53 | 83 | S
32 | 50 | 2
c0 | 192 |
52 | 82 | R
c0 | 192 |
67 | 103 | g
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
d8 | 216 | Ø
5d | 93 | ]
0 | 0 |
4 | 4 |
c6 | 198 |
5f | 95 | _
fb | 251 |
a | 10 |
c0 | 192 |
78 | 120 | x
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
2 | 2 |
66 | 102 | f
aa | 170 |
0 | 0 |
4 | 4 |
cc | 204 |
46 | 70 | F
80 | 128 |
1 | 1 |
c0 | 192 |
90 | 144 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
0 | 0 |
1 | 1 |
d8 | 216 | Ø
5d | 93 | ]
0 | 0 |
4 | 4 |
cd | 205 |
da | 218 |
9c | 156 |
2a | 42 | *
The first two bytes
are the ID bytes and because our query packet's ID was 01, this packet too has
the ID set to 01.
The next two
bytes after that are 81 80. These, as before, are the flags. If you refer to
the description of the flags above, you'll see that QR is set because this is a
response, RD is copied over from our query, and RA is set, which means that the
DNS server supports recursion.
Flags       | QR | OP Code | AA | TC | RD | RA | Zeros | Rcode
No. of Bits | 1  | 4       | 1  | 1  | 1  | 1  | 3     | 4
Value       | 1  | 0000    | 0  | 0  | 1  | 1  | 000   | 0000
Right after
that are two more bytes which hold the number of questions; 01 in this case.
Then we have two more bytes which hold the number of answers the DNS has sent
us; 02 in this packet. Then we're informed about the number of authoritative
servers for this query (03) and after that comes the number of additional
records provided (03).
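The first twelve bytes of every DNS packet can be read with one small routine. What follows is only a sketch; struct dns_header, read_u16() and parse_header() are names we've made up for the purpose.

```c
#include <assert.h>
#include <stddef.h>

/* The six 16-bit fields that open every DNS packet. */
struct dns_header {
    unsigned short id, flags, questions, answers, authority, additional;
};

/* DNS stores numbers with the high byte first. */
static unsigned short read_u16(const unsigned char *p)
{
    return (unsigned short)((p[0] << 8) | p[1]);
}

/* Fill h from the first 12 bytes of pkt. Returns 1 on success,
   0 if the packet is too short to hold a header. */
int parse_header(const unsigned char *pkt, size_t len, struct dns_header *h)
{
    assert(pkt != NULL && h != NULL);
    if (len < 12)
        return 0;
    h->id         = read_u16(pkt);
    h->flags      = read_u16(pkt + 2);
    h->questions  = read_u16(pkt + 4);
    h->answers    = read_u16(pkt + 6);
    h->authority  = read_u16(pkt + 8);
    h->additional = read_u16(pkt + 10);
    return 1;
}
```

Fed the first twelve bytes of the response above, this gives an ID of 1, flags of 0x8180, 1 question, 2 answers, 3 authority records and 3 additional records.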
Now comes the
entire name of the site we wished information about, with a final zero at the
end to signify the termination of the string. The next two bytes after the
string hold the number 01, which means we've asked for an IP address and the
next two bytes also hold the number 01, which means the site is on the
Internet.
As you can
see, a large portion of the query has been duplicated in the response packet.
This lets us check that the answer belongs to the question we asked. It's only
now that the real answer begins... Now a DNS response has to fit in a small UDP
packet, so some way had to be found to keep it short. So the DNS incorporates a
simple compression trick. The next byte of data is C0 which if written out in
binary would look like 11000000. When the first two bits are on, it means that
this byte and the next one form a pointer rather than a label; the remaining 14
bits are an offset from the start of the packet. Here the offset is 0C (12 in
decimal), which points to the data 3www8netscape3com0, 12 bytes from the start
of the packet. By using a pointer to the name, the DNS avoids having to repeat
information which has already been mentioned. Neat!
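Following these pointers by hand gets old fast, so here's a sketch of a routine that reads a name out of a packet, pointers and all. read_name() is our own hypothetical helper, and the limit of 16 jumps is an arbitrary safety net we've picked to stop a malicious packet from sending us round in circles.

```c
#include <assert.h>
#include <string.h>

/* Decode the name starting at pkt[pos] into out as dotted text,
   following compression pointers. Returns the number of bytes the
   name occupies at pos (2 for a bare pointer), or 0 on bad input. */
size_t read_name(const unsigned char *pkt, size_t len, size_t pos,
                 char *out, size_t outsize)
{
    size_t used = 0;          /* bytes consumed at the original position */
    size_t o = 0;             /* write index into out */
    int jumped = 0, hops = 0;

    assert(pkt != NULL && out != NULL && outsize > 0);
    for (;;) {
        unsigned char c;
        if (pos >= len)
            return 0;
        c = pkt[pos];
        if ((c & 0xC0) == 0xC0) {              /* top two bits on: pointer */
            if (pos + 1 >= len || ++hops > 16)
                return 0;
            if (!jumped)
                used += 2;
            jumped = 1;
            pos = ((size_t)(c & 0x3F) << 8) | pkt[pos + 1];
        } else if (c == 0) {                   /* zero length: end of name */
            if (!jumped)
                used += 1;
            break;
        } else {                               /* ordinary label of c bytes */
            if (c > 63 || pos + 1 + c > len || o + c + 2 > outsize)
                return 0;
            if (o > 0)
                out[o++] = '.';
            memcpy(out + o, pkt + pos + 1, c);
            o += c;
            pos += 1 + (size_t)c;
            if (!jumped)
                used += 1 + (size_t)c;
        }
    }
    out[o] = '\0';
    return used;
}
```

Called at offset 12 of the response it yields www.netscape.com and reports 18 bytes used; called at offset 34, where the bytes C0 0C sit, it yields the same name but reports only 2 bytes used.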
After that we
have two bytes (the record type), set to 05, which means that the data following
is the canonical name that www.netscape.com is an alias for. A CNAME or
Canonical Name is the real name of a host which is also known by one or more
aliases. So, for example, www.cnn.com could be an alias for the canonical name
cnn.com. Right after that come two bytes for the class, which is 01. Now come
four bytes for the Time to Live (TTL). The Time to Live is the length of time,
in seconds, that the querying machine (you or the DNS) can cache the response.
The usual time is a day or two; here it is 00 00 08 69, or 2153 seconds, which
is about 36 minutes. Next we have two bytes for the length of the data to
follow, which is 14 in hex (20 bytes in decimal). It's only now that we reach
the meat of the matter, the actual name of the server whose IP address we wish
to know. We're told that www.netscape.com is an alias for a server whose
canonical name is www80.netscape.com. This suggests that in actuality, Netscape
Inc. has a whole set of numbered servers running at one time to handle the
load. When you type www.netscape.com, you're shunted to one of them, presumably
the one with the least amount of people on it. A neat trick and one you
wouldn't have discovered without reading the DNS packet!
After that we
have C0, which as usual tells us that a pointer follows. The byte after that,
2E (46 in decimal), is a pointer to the text 5www808netscape3com0. The first 01
after it is the record type, an IP address this time, and the second 01 is the
class, which means the record is for the Internet. After four bytes of the Time
to Live come two bytes holding 04, which is the length of the data that
follows. The next four bytes hold the most important part of the packet; the IP
address of www80.netscape.com. The numbers C6.5F.F9.4B are in hex and if we
convert them to decimal, we get 198.95.249.75. So that's it! The real IP
address of www80.netscape.com, which is the real name of www.netscape.com. Just
to double check, ping the IP address 198.95.249.75 and see the name of the
site. Try and decipher the other parts of the message yourself. They all follow
the same format.
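Every record in the answer, authority and additional sections has the same shape: a name, then type, class, Time to Live, data length and the data itself. Here's a sketch of how one such record could be read. The names struct dns_rr, skip_name(), parse_rr() and format_a() are our own, made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The fixed fields of one record, plus the offset where its data starts. */
struct dns_rr {
    unsigned short type, rclass, rdlength;
    unsigned long ttl;
    size_t rdata;
};

/* Return the offset just past the name at pos, or 0 if it runs off the end. */
static size_t skip_name(const unsigned char *pkt, size_t len, size_t pos)
{
    while (pos < len) {
        unsigned char c = pkt[pos];
        if ((c & 0xC0) == 0xC0)
            return pos + 2;            /* a pointer always ends the name */
        if (c == 0)
            return pos + 1;            /* zero length label ends the name */
        pos += 1 + (size_t)c;
    }
    return 0;
}

/* Parse the record at pos. Returns the offset of the next record,
   or 0 if the record does not fit in the packet. */
size_t parse_rr(const unsigned char *pkt, size_t len, size_t pos,
                struct dns_rr *rr)
{
    size_t p = skip_name(pkt, len, pos);
    assert(rr != NULL);
    if (p == 0 || p + 10 > len)
        return 0;
    rr->type     = (unsigned short)((pkt[p] << 8) | pkt[p + 1]);
    rr->rclass   = (unsigned short)((pkt[p + 2] << 8) | pkt[p + 3]);
    rr->ttl      = ((unsigned long)pkt[p + 4] << 24) |
                   ((unsigned long)pkt[p + 5] << 16) |
                   ((unsigned long)pkt[p + 6] << 8) | pkt[p + 7];
    rr->rdlength = (unsigned short)((pkt[p + 8] << 8) | pkt[p + 9]);
    rr->rdata    = p + 10;
    if (rr->rdata + rr->rdlength > len)
        return 0;
    return rr->rdata + rr->rdlength;
}

/* Write a type 1 (IP address) record as dotted decimal into out
   (at least 16 bytes). Returns 0 if the record is not a 4-byte address. */
int format_a(const unsigned char *pkt, const struct dns_rr *rr, char *out)
{
    const unsigned char *a = pkt + rr->rdata;
    if (rr->type != 1 || rr->rdlength != 4)
        return 0;
    sprintf(out, "%d.%d.%d.%d", a[0], a[1], a[2], a[3]);
    return 1;
}
```

Run over the second answer in our packet (the bytes from C0 2E up to 4B), parse_rr() reports type 1, class 1, a Time to Live of 2153 seconds and 4 bytes of data, and format_a() turns those into 198.95.249.75. Calling parse_rr() again with the offset it returned walks through the remaining records one by one.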
That just about covers the DNS!